Abstract

Enterohemorrhagic Escherichia coli (EHEC) O26:H11/H− is the predominant non-O157 EHEC serotype among patients with diarrhea, bloody diarrhea, and hemolytic uremic syndrome (HUS) worldwide. To elucidate its phylogeny and the association between its phylogenetic background and the clinical outcome of infection, we investigated 120 EHEC O26:H11/H− strains isolated between 1965 and 2012 from asymptomatic carriers and from patients with diarrhea or HUS. Whole-genome shotgun sequencing (WGS) was applied to ten representative EHEC O26 isolates to determine single nucleotide polymorphism (SNP) localizations within a predefined set of core genes. A multiplex SNP assay, comprising a randomly distributed subset of 48 SNPs, was established to detect SNPs in 110 additional EHEC O26 strains. Within approximately 1 Mb of core genes, WGS yielded 476 high-quality bi-allelic SNP localizations. Forty-eight of these were subsequently investigated in 110 EHEC O26 isolates, and four different SNP clonal complexes (SNP-CCs) were identified. SNP-CC2 was significantly associated with the development of HUS. Within the subsequently established evolutionary model of EHEC O26, we dated the emergence of human EHEC O26 to approximately 19,700 years ago and demonstrated a recent evolution within humans into the four SNP-CCs over the past 1,650 years. WGS and subsequent SNP typing enabled us to gain new insights into the evolution of EHEC O26, suggesting a common theme in this EHEC group with analogies to EHEC O157. In addition, the SNP-CC analysis may help to assess the risk of progression to HUS in infected individuals and to implement more specific infection control measures.
Introduction

Enterohemorrhagic Escherichia coli (EHEC) are a highly pathogenic subgroup of Shiga toxin (Stx)-producing E. coli. In humans, EHEC infections cause watery and bloody diarrhea and hemolytic uremic syndrome (HUS) (Tarr et al. 2005; Mellmann et al. 2009), which is the most common cause of acute renal failure in children (Brandt et al. 1994; Kaplan 1998; Tarr et al. 2005). Although EHEC O157:H7 is the serotype most commonly associated with HUS worldwide (Banatvala et al. 2001; Robert-Koch-Institut 2008; Centers for Disease Control and Prevention [CDC] 2011), the large O104:H4 outbreak in Germany in spring 2011 (Bielaszewska et al. 2011; Mellmann et al. 2011) and several outbreaks caused by other non-O157 EHEC such as O26 (Bradley et al. 2012; Brown et al. 2012; L'Abee-Lund et al. 2012; Wahl et al. 2011) attest to the potential menace of non-O157 EHEC. Among these, EHEC O26:H11/H− (nonmotile) are the serotypes most frequently associated with severe human disease in Europe (Gerber et al. 2002; Tozzi et al. 2003; Ethelberg et al. 2004; Espie et al. 2008; Mellmann et al. 2008; Zimmerhackl et al. 2010; Kappeli et al. 2011; Buvens et al. 2012) and the United States (Jelacic et al. 2003; Brooks et al. 2005; Hedican et al. 2009). Furthermore, EHEC O26 has been increasingly detected in South American (Rivas et al. 2006), Asian (Hiroi et al. 2012), and Australian (Vally et al. 2012) patients, demonstrating the global dissemination of this EHEC serogroup. Moreover, EHEC O26 infections can be as severe as EHEC O157 infections with respect to acute HUS and long-term sequelae (Gerber et al. 2002; Pollock et al. 2011; Zieg et al. 2012; Rosales et al. 2012).

The evolution of EHEC O157 has been thoroughly characterized in stepwise evolutionary models, in which EHEC O157 emerged from E. coli O55:H7 through the loss and acquisition of virulence and phenotypic traits (Feng et al. 2007). This scenario was built on multilocus enzyme electrophoresis and multilocus sequence typing (MLST) data (Feng et al. 1998, 2007). Later, analyses based on multilocus variable-number tandem-repeat analysis (MLVA) and single nucleotide polymorphisms (SNPs) enabled a more precise reconstruction of this model and further resolved it into distinct, evolutionarily conserved subtypes (Leopold et al. 2009; Jenke et al. 2010, 2012). This approach allowed different HUS rates to be assigned to different subtypes (Alpers et al. 2009; Jenke et al. 2010). In contrast, little is known about the evolution of EHEC O26. Recently, we identified a newly emerging, highly virulent clone within EHEC O26 based on MLST and specific virulence determinants (Bielaszewska et al. 2013); however, neither its evolutionary origin nor its reservoir is currently known. We therefore applied whole-genome shotgun sequencing (WGS) to representative EHEC O26 isolates from human disease to develop an evolutionary model of this important pathogen, and subsequently investigated the molecular epidemiology of a diverse European collection of EHEC O26 using a SNP-based assay. In addition, we examined whether the detected genotypes also reflect the presence of highly pathogenic clones within the EHEC O26 population, ultimately enabling an assessment of the risk that an EHEC O26 infection progresses to HUS.

Materials and Methods 

Bacterial Isolates 

In total, we investigated 120 EHEC O26:H11/H− isolates (supplementary table S1, Supplementary Material online). All isolates were intimin (eae) positive and harbored the Stx1a-encoding gene (stx1a), the Stx2a-encoding gene (stx2a), or both. Ten of the 120 isolates, selected to represent phylogenetic breadth based on isolation year and country (where applicable), were subjected to WGS for subsequent development of the evolutionary model and the SNP assay. To evaluate the model and investigate the association of genotypes with clinical outcomes, we included a representative subset of well-characterized clinical isolates from a previously published study (Bielaszewska et al. 2013) and four isolates from asymptomatic carriers. This subset was otherwise selected at random, except that all isolates from countries other than Germany (n = 60) and all isolates of the rare MLST sequence types (STs) ST396, ST591, ST1565, ST1566, and ST1705 were deliberately included (for details see supplementary table S1, Supplementary Material online). The chromosomal sequence of O26:H11 strain 11368 (Ogura et al. 2009; NCBI accession number NC_013361.1) served as the reference.



Whole-Genome Shotgun Sequencing, Sequence Analysis, and SNP Discovery

For WGS of the ten EHEC O26 isolates, sequencing libraries were prepared using the Nextera XT chemistry (Illumina Inc., San Diego, CA, USA) for a 100-bp paired-end sequencing run on an Illumina HiScanSQ sequencer in accordance with the manufacturer's recommendations (Illumina Inc.). Sequencing reads were assembled with the reference assembler of the CLC Genomics Workbench (CLC bio, Denmark), using the chromosomal sequence of O26:H11 strain 11368 (Ogura et al. 2009) as the reference. To create a robust phylogeny, we extracted the core genome open reading frame (ORF) sequences, starting from the previously published list of core ORFs (Mellmann et al. 2011), and included all ORFs that were present in all O26 isolates (n = 1,130). As an outgroup for phylogenetic analysis, the chromosomal sequence of EHEC O111:H− strain 11128 (NCBI accession no. NC_013364.1; Ogura et al. 2009) was used. SNPs were discovered by mapping the consensus sequence of each isolate against the O26 reference sequence using the Ridom SeqSphere software version 0.99 beta (Ridom GmbH, Münster, Germany).
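The SNP discovery step can be illustrated with a minimal Python sketch. It is not the Ridom SeqSphere procedure used in the study; it assumes, hypothetically, that each isolate's consensus sequence for the concatenated core ORFs is already aligned to the reference without gaps, so that positions can be compared column by column, and it keeps only bi-allelic columns. The function name `find_biallelic_snps` and the toy sequences are illustrative.

```python
from typing import Dict, List, Tuple

VALID_BASES = set("ACGT")


def find_biallelic_snps(reference: str, isolates: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, variant base) for every column
    in which exactly two distinct unambiguous bases occur across the reference
    and all isolates (assumes gap-free, equal-length aligned sequences)."""
    for name, seq in isolates.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name}: sequence length differs from the reference")
    snps: List[Tuple[int, str, str]] = []
    for i, ref_base in enumerate(reference):
        column = {ref_base} | {seq[i] for seq in isolates.values()}
        # Skip columns containing ambiguous calls (e.g., N) and monomorphic columns;
        # keep only bi-allelic columns (exactly two distinct bases)
        if not column <= VALID_BASES or len(column) != 2:
            continue
        variant = (column - {ref_base}).pop()
        snps.append((i + 1, ref_base, variant))
    return snps


reference = "ATGAAACGT"
isolates = {"iso1": "ATGAAACGT", "iso2": "ATGTAACGT", "iso3": "ATGTAACGA"}
print(find_biallelic_snps(reference, isolates))  # [(4, 'A', 'T'), (9, 'T', 'A')]
```

A tri-allelic column (for example, a fourth isolate carrying C at position 4) is dropped by this filter, mirroring the restriction to bi-allelic SNP localizations.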

Phylogenetic Analysis 

To infer the evolutionary model of EHEC O26 from the core genome ORF sequences of the ten shotgun-sequenced isolates and the reference strain 11368, a neighbor-joining tree was initially constructed using the MEGA5 software with default parameters (Tamura et al. 2011). For deeper analysis of the evolutionary history, we concatenated the ORFs present in all isolates and calculated the Ks values according to the modified version of the Yang-Nielsen algorithm (MYN) using the KaKs Calculator 2.0 (Zhang et al. 2006; Wang et al. 2010). From these values, an estimated synonymous substitution rate of $\mu = 1.44 \times 10^{-10}$ per base pair per generation (Lenski et al. 2003), and $g = 300$ generations per year for E. coli in vivo (Guttman and Dykhuizen 1994), we determined the age interval between two isolates as

$$T = \frac{K_s}{\mu \cdot g} = \frac{K_s}{(1.44 \times 10^{-10}\ \text{bp}^{-1}\,\text{generation}^{-1}) \times (300\ \text{generations}\,\text{year}^{-1})}.$$

The concatenated sequences of the intermediate isolates (postulated ancestors) were defined by the sequences of the precursor isolate within the neighbor-joining tree. To portray the SNP data from all 120 isolates, we generated a minimum spanning tree using the Ridom SeqSphere software version 0.99 beta (Ridom GmbH).
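The dating formula reduces to simple arithmetic. The following Python sketch encodes $T = K_s/(\mu \cdot g)$ with the two constants stated in the Methods ($\mu = 1.44 \times 10^{-10}$ per bp per generation, $g = 300$ generations per year). It does not reproduce the MYN estimation of Ks itself, and the Ks value in the example is back-calculated from the reported 19,700 years for illustration only; it is not a value taken from the study.

```python
# Constants as stated in the Methods text
MU = 1.44e-10               # synonymous substitutions per bp per generation (Lenski et al. 2003)
GENERATIONS_PER_YEAR = 300  # in vivo E. coli generations per year (Guttman and Dykhuizen 1994)


def years_from_ks(ks: float, mu: float = MU,
                  generations_per_year: float = GENERATIONS_PER_YEAR) -> float:
    """Age interval T = Ks / (mu * g), in years."""
    return ks / (mu * generations_per_year)


def ks_from_years(years: float, mu: float = MU,
                  generations_per_year: float = GENERATIONS_PER_YEAR) -> float:
    """Inverse of years_from_ks: the Ks corresponding to an interval of `years`."""
    return years * mu * generations_per_year


ks_split = ks_from_years(19_700)  # back-calculated, illustration only
print(f"Ks for 19,700 years: {ks_split:.3e}")                 # 8.510e-04
print(f"Years from that Ks: {years_from_ks(ks_split):,.0f}")  # 19,700
```

Because T is inversely proportional to g, any other assumed generation rate can be explored by passing `generations_per_year` explicitly.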

Bead-Based Multiplex Assay for SNP Detection 

For robustness, the 48 selected SNP localizations were divided into four multiplex sets (supplementary table S4, Supplementary Material online) for subsequent detection using MagPlex-TAG microspheres on the Luminex MAGPIX platform (Luminex Corp., Austin, TX). All 96 multiplex polymerase chain reaction (PCR) primers and all 96 multiplex allele-specific primer extension (ASPE) primers, the latter with the appropriate TAG sequences appended, were designed for the SNP sets (wild type and variant) with PrimerPlex 2.6 (PREMIER Biosoft International, Palo Alto, CA). For each SNP locus, one ASPE primer for the reference allele (wild type) and one ASPE primer for the allelic variant (variant) were designed. In principle, this dual design ensures that, when samples are screened, each locus yields one positive and one negative allele call. Consequently, possible tri-allelic or even tetra-allelic polymorphisms would be detected during the measurements, because neither allele-specific primer would be expected to give a positive signal. The multiplex PCR primers with the respective amplicon sizes and the multiplex ASPE primers with capture sequences (TAG sequences) and corresponding bead numbers are shown in supplementary table S4, Supplementary Material online.
The following procedure was performed in accordance with the manufacturer's recommendations (Song et al. 2010), with minor modifications. Briefly, the multiplex PCR for amplification of 12 loci per set (supplementary table S4, Supplementary Material online) was performed in 12.5 µl containing 6.25 µl REDTaq ReadyMix (Sigma-Aldrich, Munich, Germany), 1 µl each of forward and reverse primer mix (5 µM), 3.25 µl PCR water, and 1 µl template DNA extracted from a single fresh colony. The PCR cycling parameters consisted of an initial step at 80 °C for 5 min; 30 cycles of 94 °C for 45 s, 60 °C for 45 s, and 72 °C for 60 s; and a final step at 72 °C for 10 min. Subsequently, 5 µl of the PCR product was purified with 1.5 U Exonuclease I (E. coli) and 1.5 U Shrimp Alkaline Phosphatase (Exo/SAP) by incubation at 37 °C for 45 min, followed by an inactivation step at 80 °C for 15 min. Each 20 µl ASPE reaction contained 2 µl 10× buffer, 1 µl MgCl2 (25 mM), 1 µl dNTP mix (dTTP, dGTP, dATP; Sigma-Aldrich), 0.25 µl biotin-dCTP (400 µM; Invitrogen, Darmstadt, Germany), 1 µl ASPE primer mix (0.5 µM), 0.75 µl AmpliTaq (5 U/µl; Applied Biosystems, Foster City, CA), 9 µl PCR water, and 5 µl purified PCR product. The cycling conditions were 96 °C for 2 min, followed by 30 cycles of 94 °C for 30 s, 60 °C for 60 s, and 74 °C for 2 min. For the final hybridization step, the appropriate MagPlex-TAG microspheres for each set of 24 corresponding bead types (1,250 beads of each type per reaction) were used. The hybridization mix was washed twice and incubated in 1× Tm hybridization buffer containing 4 µg/ml streptavidin-R-phycoerythrin conjugate (SAPE) (Invitrogen, Darmstadt, Germany) at 37 °C for 15 min. Finally, fluorescence was measured (50 counts within 1 min) using the xPonent 4.1 software (Luminex Corp.).
Based on the median fluorescence intensity (MFI) and the net MFI, a SNP call was accepted only if the following quality criteria were met: detection of >50 beads per bead type, an MFI > 300, and a ratio $\mathrm{MFI}_{\text{called allele}} / (\mathrm{MFI}_{\text{wild-type allele}} + \mathrm{MFI}_{\text{variant allele}}) > 0.9$ (Song et al. 2010). Measurements that did not fulfill these criteria were Sanger sequenced to validate the putative SNPs. For initial validation of the SNP assay, the putative SNP localizations of two samples (1226/65 and 3271/00) that differed considerably in their SNP profiles were Sanger sequenced. Moreover, reproducibility of the assay was evaluated by analyzing one O26 isolate (126814/98) in five independent replicates, starting from the extracted DNA (Song et al. 2010).
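The acceptance rule for a single SNP locus can be written as a small function. This Python sketch illustrates the stated criteria under assumptions rather than reproducing the xPonent implementation: it assumes that the bead-count threshold applies to both allele-specific bead types, that the MFI threshold applies to the called (brighter) allele, and that raw MFI values are used. The names `AlleleSignal` and `call_snp` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MIN_BEADS = 50    # "detection of >50 beads per bead type"
MIN_MFI = 300.0   # "an MFI > 300" (assumed here to apply to the called allele)
MIN_RATIO = 0.9   # MFI(called) / (MFI(wild type) + MFI(variant)) > 0.9


@dataclass
class AlleleSignal:
    mfi: float       # median fluorescence intensity of this allele's bead type
    bead_count: int  # beads of this type counted in the well


def call_snp(wild_type: AlleleSignal, variant: AlleleSignal) -> Optional[str]:
    """Return "wild type" or "variant" when all quality criteria are met;
    return None otherwise (such loci were resolved by Sanger sequencing)."""
    if wild_type.bead_count <= MIN_BEADS or variant.bead_count <= MIN_BEADS:
        return None
    total = wild_type.mfi + variant.mfi
    if total <= 0:
        return None
    if wild_type.mfi >= variant.mfi:
        label, called = "wild type", wild_type
    else:
        label, called = "variant", variant
    if called.mfi <= MIN_MFI or called.mfi / total <= MIN_RATIO:
        return None
    return label


print(call_snp(AlleleSignal(4200, 80), AlleleSignal(150, 75)))  # wild type
print(call_snp(AlleleSignal(900, 80), AlleleSignal(400, 80)))   # None (ratio about 0.69)
```

Returning `None` rather than raising keeps ambiguous loci in the data flow, so they can be collected and passed on for sequencing.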

Data and Statistical Analysis 

The xPonent software of the Luminex MAGPIX system calculates the following values: MFI, net MFI, Count, Allelic Call, and Allelic Ratio. The data were exported and further processed in Microsoft Excel. We tabulated the clinical picture (HUS, bloody diarrhea, diarrhea, asymptomatic, or unknown disease) against the corresponding SNP allelic profiles to evaluate whether certain SNP genotypes were associated with a specific disease. For statistical analysis, we used Fisher's exact test in Epi Info 7 (Centers for Disease Control and Prevention, Atlanta, GA). P values < 0.05 were considered statistically significant.

Results 

Whole-Genome Shotgun Sequencing of Ten Representative EHEC O26 Isolates for SNP Discovery

SNP discovery was performed using WGS of the ten EHEC O26 isolates listed in supplementary table S1, Supplementary Material online. After assembly, we queried the assemblies for the 1,144 ORF sequences of the previously published E. coli core ORF definition (Mellmann et al. 2011). In total, 1,130 of these 1,144 ORFs were present in all O26 isolates and were extracted for further analysis (supplementary table S2, Supplementary Material online). Within these 1,130 ORFs, representing 952,632 bp of the chromosome, we identified 476 SNPs in 298 ORFs (supplementary table S3, Supplementary Material online). Of these, we manually selected 48 SNP localizations at random, 12 per quarter of the chromosome (fig. 1), to develop a multiplex SNP assay. All 48 SNPs were bi-allelic, synapomorphic polymorphisms; compared with the reference sequence, 30 were nonsynonymous (ns) and 18 were synonymous (s) (table 1).
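The selection of 12 SNPs per chromosome quarter was done manually in the study; an automated equivalent might look like the following Python sketch. It assumes 1-based SNP positions on the reference chromosome and a chromosome length supplied by the user; the seed, the toy 4,000-bp chromosome, and the function name `pick_snps_per_quarter` are hypothetical.

```python
import random
from typing import List, Sequence


def pick_snps_per_quarter(positions: Sequence[int], genome_length: int,
                          per_quarter: int = 12, seed: int = 0) -> List[int]:
    """Randomly pick `per_quarter` SNP positions from each quarter of a
    chromosome of length `genome_length` (positions are 1-based)."""
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    quarter = genome_length / 4
    chosen: List[int] = []
    for q in range(4):
        lo, hi = q * quarter, (q + 1) * quarter
        candidates = sorted(p for p in positions if lo < p <= hi)
        if len(candidates) < per_quarter:
            raise ValueError(f"quarter {q + 1} contains only {len(candidates)} SNPs")
        chosen.extend(rng.sample(candidates, per_quarter))
    return sorted(chosen)


# Toy example: 400 evenly spaced SNP positions on a hypothetical 4,000-bp chromosome
toy_positions = list(range(1, 4001, 10))
picked = pick_snps_per_quarter(toy_positions, genome_length=4000)
print(len(picked))  # 48
```

Stratifying by quarter before sampling guarantees an even genomic spread, which an unstratified draw of 48 positions would not.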

SNP Typing of 120 O26 Isolates Using the Multiplex Assay

To achieve the desired robustness, we divided the multilocus genotyping assay into four sets comprising 12 SNP localizations each (supplementary table S4, Supplementary Material online). To test the accuracy of the assay, two isolates (1226/65 and 3271/00) were initially SNP genotyped, and all alleles were Sanger sequenced. At all localizations, the SNP assay was concordant with the sequencing results. To test reproducibility, one strain (126814/98) was tested in five independent repeats. Supplementary figures S1 and S2, Supplementary Material online, show the average background-corrected MFI (net MFI) with standard deviation for the different alleles of the 48 investigated SNPs.
Fig. 1. Distribution of the investigated 1,130 core genome ORFs (in green), the 476 discovered SNPs (in blue), and the 48 SNPs of the multiplex assay (in red), illustrated on a circular map of the reference genome of O26:H11 strain 11368.



In all cases, the SNP call was unambiguous, and the MFI of the called allele was always at least 13-fold greater than the MFI of the corresponding uncalled allele. After this validation, a total of 5,760 SNP calls (48 SNPs in each of 120 strains) were made for the 120 EHEC O26 strains. Only 2.1% of these (122 SNP calls) had to be confirmed by Sanger sequencing because the values were ambiguous. In all of these cases, the failure was due to mutations in the binding region of the ASPE primer, and sequencing confirmed the missing SNPs as known alleles (table 1).

Overall, SNP genotyping of the 120 EHEC O26 isolates resulted in ten unique SNP profiles. Their phylogenetic relationships are displayed in a minimum spanning tree (MST) in figure 2. Clustering of SNP genotypes enabled us to define SNP clonal complexes (SNP-CCs) as phylogenetically conserved groups, analogous to MLST clonal complexes, which are phylogenetically informative groups (Feil and Spratt 2001). Isolates sharing >90% of the 48 SNPs (i.e., at least 44 SNPs) were grouped, resulting in four different SNP-CCs (SNP-CC1 to SNP-CC4) (fig. 2). Further details of the 48 investigated SNP localizations, for example, their ability to serve as a canonical SNP for a certain SNP-CC, are given in table 1. Of the four SNP-CCs, SNP-CC2 and SNP-CC3 encompassed most isolates (60 [50.0%] and 39 [32.5%] isolates, respectively). The remaining SNP-CCs contained 13 (10.8%, SNP-CC4) and 8 (6.7%, SNP-CC1) isolates. Comparison of SNP data with MLST corroborated this separation, as all isolates of SNP-CC2 and SNP-CC4 were exclusively MLST ST29 and ST21, respectively. Moreover, nearly all SNP-CC3 isolates (36 of 39) were ST21, and the majority of SNP-CC1 isolates (6 of 8) were ST29; the remaining isolates were single-locus variants (SLVs) of either ST21 (ST591, ST1565, and ST1705 in SNP-CC3) or ST29 (ST396 and ST1566 in SNP-CC1). Taken together, SNP genotyping subdivided the EHEC O26 population into four different SNP-CCs, one of which (SNP-CC2) was the recently described highly pathogenic "new clone" that separated from the remaining O26 population (Bielaszewska et al. 2013).
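The grouping rule can be sketched in Python as follows. The sketch assumes, as an interpretation rather than the documented SeqSphere procedure, that SNP-CCs correspond to single-linkage groups: profiles joined by minimum-spanning-tree edges of at most four differing SNPs (that is, at least 44 of 48 shared). It builds the tree with Prim's algorithm on pairwise SNP distances; the toy six-SNP profiles are invented for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]


def snp_distance(a: str, b: str) -> int:
    """Number of differing SNP alleles between two equal-length profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles: Dict[str, str]) -> List[Edge]:
    """Prim's algorithm over pairwise SNP distances (roughly cubic in the
    number of profiles, which is fine for a handful of unique profiles)."""
    names = list(profiles)
    if not names:
        return []
    in_tree, remaining = [names[0]], names[1:]
    edges: List[Edge] = []
    while remaining:
        u, v, d = min(((u, v, snp_distance(profiles[u], profiles[v]))
                       for u in in_tree for v in remaining), key=lambda e: e[2])
        in_tree.append(v)
        remaining.remove(v)
        edges.append((u, v, d))
    return edges


def snp_clonal_complexes(profiles: Dict[str, str], max_diff: int = 4) -> List[List[str]]:
    """Single-linkage groups: keep MST edges of at most `max_diff` SNPs.
    With 48 SNPs, max_diff=4 means sharing at least 44 alleles (>90%)."""
    parent = {n: n for n in profiles}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v, d in minimum_spanning_tree(profiles):
        if d <= max_diff:
            parent[find(u)] = find(v)
    groups: Dict[str, List[str]] = {}
    for n in profiles:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


toy = {"p1": "AAAAAA", "p2": "AAAAAT", "p3": "AAAATT", "p4": "GGGGAA", "p5": "GGGGAT"}
print(snp_clonal_complexes(toy, max_diff=1))  # [['p1', 'p2', 'p3'], ['p4', 'p5']]
```

Cutting MST edges above a threshold is equivalent to single-linkage clustering at that threshold, which is why the same tree used for display (fig. 2) can also define the groups.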

Evolutionary Model of EHEC O26

The phylogenetic topology of the ten representative EHEC O26 isolates (supplementary table S1, Supplementary Material online), the O26:H11 reference strain 11368, and, as an outgroup, the next most closely related EHEC serotype, O111:H− strain 11128 (NC_013364.1) (Ogura et al. 2009; Ju et al. 2012), is shown in a neighbor-joining tree (fig. 3). The branching within this tree was concordant with the separation based on the SNP assay (fig. 2), underlining that the 48 selected SNP localizations are representative and unbiased. We propose an evolutionary model of EHEC O26 with a subdivision into four SNP-CCs. Using the Ks values (number of synonymous substitutions per synonymous site) of strain 11128 (EHEC O111) and the hypothetical O26 ancestor A10, together with those of the hypothetical intermediate isolates A01 to A10, and assuming a common ancestor of E. coli O26 and O111 as previously proposed (Whittam et al. 1988), we postulate that E. coli O26 and O111 separated 19,700 years ago (fig. 4). Since then, EHEC O26 likely developed sequentially from SNP-CC1 to SNP-CC4. The evolution of these clonal complexes occurred within 1,650 years of this bifurcation (fig. 4). During this evolution, the core genome and stx, the most important virulence marker, evolved in parallel, as both were almost exclusively found in fixed combinations within the different SNP-CCs (fig. 4).

Clinical Implications of EHEC O26 Separation into Four SNP-CCs

Finally, we investigated whether the separation of EHEC O26 into SNP-CCs is associated with human disease by analyzing the association of the SNP genotypes with the clinical outcomes of infection. Indeed, isolates of SNP-CC2, which comprised 50.0% of all strains and accounted for 61.8% (47 of 76) of all HUS cases in this study (table 2), showed a highly significant association with the development of HUS (odds ratio 3.86, 95% confidence interval 1.63-9.30, P < 0.01). In contrast, none of the other SNP-CCs was positively associated with HUS; SNP-CC4 was in fact significantly underrepresented among HUS cases (odds ratio 0.08, 95% confidence interval 0.01-0.42, P < 0.01) (table 2).
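The odds ratios in Table 2 can be reproduced from its disease counts if each SNP-CC is compared with all remaining isolates, contrasting HUS with all other outcomes (bloody diarrhea, diarrhea, asymptomatic, unknown); this reading is inferred from the fact that it yields the tabulated ORs. The Python sketch below implements the standard two-sided Fisher's exact test directly from the hypergeometric distribution instead of calling Epi Info, so its P values may differ slightly from Table 2, and it does not compute the confidence intervals.

```python
from math import comb
from typing import Dict, Tuple

# Disease counts per SNP-CC transcribed from Table 2, in the order HUS, BD, D, A, U
TABLE2: Dict[str, Tuple[int, int, int, int, int]] = {
    "SNP-CC1": (5, 0, 2, 0, 1),
    "SNP-CC2": (47, 0, 11, 1, 1),
    "SNP-CC3": (22, 3, 11, 3, 0),
    "SNP-CC4": (2, 5, 6, 0, 0),
}


def two_by_two(cc: str) -> Tuple[int, int, int, int]:
    """2x2 table (a, b, c, d): rows = this SNP-CC vs. all others,
    columns = HUS vs. any other outcome."""
    hus_total = sum(row[0] for row in TABLE2.values())
    n_total = sum(sum(row) for row in TABLE2.values())
    a = TABLE2[cc][0]
    b = sum(TABLE2[cc]) - a
    c = hus_total - a
    d = (n_total - a - b) - c
    return a, b, c, d


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c); undefined if b or c is zero."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities of all
    tables with the observed margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # small relative tolerance so that tables tied with the observed one are counted
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


for cc in TABLE2:
    table = two_by_two(cc)
    print(f"{cc}: OR = {odds_ratio(*table):.2f}, "
          f"two-sided Fisher P = {fisher_exact_two_sided(*table):.2g}")
```

With these counts, the printed ORs are 0.96, 3.86, 0.65, and 0.08 for SNP-CC1 to SNP-CC4, matching Table 2.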

Discussion 

To elucidate the evolutionary history of EHEC O26 and to analyze more precisely the differentiation of this globally emerging pathogen into groups with different genotypic
Table 1

List of the 48 Synapomorphic SNPs in 47 Loci in This Study Based on the Genome Sequence of O26:H11 Strain 11368 (GenBank accession number NC_013361.1)

Locus Tag | Gene (No. of SNPs) | Absolute SNP Position | SNP | SNP Effect | SNP Allele Frequencyᵃ (%)
--- | --- | --- | --- | --- | ---
ECO26_0083 | fruR (1) | 90659 | T→Gᵇ | Synonymous | 78.33
ECO26_0094 | murC (1) | 102806 | T→G | Nonsynonymous (Tyr→Asp) | 5.83
ECO26_0341 | ykgF (1) | 363280 | T→Gᶜ | Synonymous | 56.67
ECO26_0370 | prpC (1) | 398342 | T→Cᶜ | Nonsynonymous (Tyr→His) | 56.67
ECO26_0554 | ybcF (1) | 598509 | A→G | Nonsynonymous (Thr→Ala) | 5.83
ECO26_0653 | ybdK (1) | 693357 | A→Cᵇ | Nonsynonymous (Lys→Thr) | 89.17
ECO26_0785 | sdhB (1) | 841192 | C→Gᵈ | Nonsynonymous (Asp→Glu) | 6.67
ECO26_0787 | sucB (1) | 844779 | T→G | Synonymous | 10.00
ECO26_0968 | ybjG (1) | 1020736 | T→Aᵈ | Synonymous | 6.67
ECO26_1012 | cydC (1) | 1066493 | G→Cᶜ | Nonsynonymous (Arg→Thr) | 56.67
ECO26_1062 | ssuD (1) | 1135005 | T→Gᵇ | Nonsynonymous (Ile→Ser) | 89.17
ECO26_1083 | ycbG (1) | 1157959 | A→Gᵇ | Nonsynonymous (Asn→Asp) | 89.17
ECO26_1434 | ptsG (1) | 1447293 | A→C | Nonsynonymous (Asp→Ala) | 100.00
ECO26_1531 | hyaE (1) | 1528765 | C→Tᵇ | Nonsynonymous (Ala→Val) | 89.17
ECO26_1687 | minD (1) | 1652262 | A→G | Synonymous | 100.00
ECO26_1741 | narH (1) | 1711223 | T→Cᵇ | Nonsynonymous (Val→Ala) | 89.17
ECO26_1835 | yciK (1) | 1787173 | T→C | Nonsynonymous (Met→Thr) | 100.00
ECO26_1890 | ycjF (1) | 1844594 | G→Aᵇ | Nonsynonymous (Gly→Asp) | 89.17
ECO26_2286 | speG (1) | 2221328 | A→Tᶜ | Nonsynonymous (Glu→Val) | 56.67
ECO26_2339 | fumC (1) | 2265373 | T→Cᵇ | Nonsynonymous (Leu→Pro) | 89.17
ECO26_2367 | pdxH (1) | 2297263 | C→Gᶜ | Synonymous | 56.67
ECO26_2432 | ydiA (1) | 2368018 | C→Aᵈ | Nonsynonymous (Leu→Ile) | 6.67
ECO26_2433 | aroH (1) | 2368309 | C→A | Nonsynonymous (Pro→Thr) | 99.17
ECO26_2838 | rcsA (1) | 2735850 | A→Gᵇ | Nonsynonymous (His→Arg) | 89.17
ECO26_3081 | fruB (1) | 3018001 | T→Cᵇ | Nonsynonymous (Val→Ala) | 89.17
ECO26_3092 | yejE (1) | 3030971 | A→Tᵇ | Nonsynonymous (Ser→Cys) | 89.17
ECO26_3306 | truA (1) | 3232688 | G→Tᶜ | Nonsynonymous (Arg→Leu) | 56.67
ECO26_3433 | oxc (1) | 3349414 | G→Aᵈ | Synonymous | 6.67
ECO26_3489 | yfeG (1) | 3410529 | T→Gᵈ | Nonsynonymous (Cys→Gly) | 6.67
ECO26_3612 | recO (1) | 3551044 | T→Aᵇ | Synonymous | 89.17
ECO26_3961 | ygeY (1) | 3914494 | A→G | Synonymous | 88.33
ECO26_3979 | lysS (1) | 3939467 | G→Aᵈ | Nonsynonymous (Val→Ile) | 6.67
ECO26_4164 | ttdB (1) | 4129239 | A→G | Synonymous | 100.00
ECO26_4280#1 | glmM (2) | 4253102 | T→C | Synonymous | 98.33
ECO26_4280#2 | | 4254024 | T→Gᶜ | Synonymous | 56.67
ECO26_4302 | kdsC (1) | 4272578 | T→Cᶜ | Nonsynonymous (Val→Ala) | 56.67
ECO26_4343 | tldD (1) | 4314205 | T→Cᶜ | Nonsynonymous (Cys→Arg) | 56.67
ECO26_4351 | yhdH (1) | 4326496 | G→Tᵇ | Synonymous | 89.17
ECO26_4486 | y/ff (1) | 4446501 | A→Cᵈ | Synonymous | 6.67
ECO26_4837 | gidA (1) | 4858567 | T→C | Nonsynonymous (Val→Ala) | 90.00
ECO26_4917 | uhpT (1) | 4943021 | A→Cᵇ | Synonymous | 89.17
ECO26_5089 | secE (1) | 5143059 | Aᵇ | Nonsynonymous (Asn→Lys) | 89.17
ECO26_5096 | rpoC (1) | 5151319 | T→C | Synonymous | 100.00
ECO26_5099 | thiG (1) | 5157488 | C→T | Nonsynonymous (Leu→Phe) | 90.00
ECO26_5139 | lysC (1) | 5206902 | T→Cᵇ | Nonsynonymous (Val→Ala) | 89.17
ECO26_5227 | adiC (1) | 5302486 | A→Gᵇ | Nonsynonymous (Tyr→Cys) | 89.17
ECO26_5302 | dipZ (1) | 5389233 | C→Gᵇ | Synonymous | 89.17
ECO26_5384 | cysQ (1) | 5467060 | T→Gᵇ | Synonymous | 89.17

ᵃ SNP allele frequency among the 120 isolates used in this study.
ᵇ Canonical SNP (canSNP) for SNP-CC4.
ᶜ SNP differentiating SNP-CC2 and SNP-CC3.
ᵈ Canonical SNP (canSNP) for SNP-CC1.
[Figure 2: minimum spanning tree with nodes grouped into SNP-CC1 to SNP-CC4 and colored by disease (HUS, bloody diarrhea, diarrhea, asymptomatic, unknown); labeled isolates include 1226/65 and 2245/98.]

Fig. 2. The minimum spanning tree (MST) shows the molecular phylogeny of 120 EHEC O26 isolates. The different colors represent the symptoms of the infected patients. Each node represents a unique SNP profile. SNP clonal complexes (SNP-CCs) are numbered (SNP-CC1 to SNP-CC4). The node size reflects the number of isolates. Small numbers on connecting lines display the distance (number of differing SNPs) between two nodes.



[Figure 3: neighbor-joining tree of the ten representative O26 isolates and the reference strains 11368 and 11128, with clades labeled SNP-CC1 to SNP-CC4; scale bar, 0.0001.]

Fig. 3. Neighbor-joining tree based on 1,130 concatenated ORFs of ten representative isolates (gray) and the reference strains O26:H11 (11368) and O111:H− (11128) (white). The SNP clonal complexes (SNP-CCs) are marked and demonstrate the quartering of the phylogenetic tree. The phylogenetic analysis was generated with MEGA5 (Tamura et al. 2011).



and clinical characteristics, we applied WGS to a diverse collection of ten EHEC O26 isolates and SNP typing to 120 isolates. Analysis of SNPs within 1,130 core genome genes enabled us not only to develop a multiplex assay with a reduced number of SNP localizations for high-throughput grouping of EHEC O26 into distinct, phylogenetically and clinically meaningful SNP-CCs, but also to establish an evolutionary model of EHEC O26.

We were surprised by the clear-cut separation of EHEC O26 into four distinct SNP-CCs based on the SNP data of the 120 strains, as our previous investigations based on MLST and virulence profiling distinguished only a single highly pathogenic clone that was distinct from the remaining EHEC O26 population in Europe (Bielaszewska et al. 2013). However, this separation is consistent with studies of EHEC O157 (Manning et al. 2008; Leopold et al. 2009; Eppinger et al. 2011) and EHEC O104 (Brzuszkiewicz et al. 2011; Mellmann et al. 2011; Rasko et al. 2011), in which genome sequence information refined, and thereby corroborated, evolutionary models.



From an evolutionary perspective, different scenarios could explain the separation of EHEC O26 into distinct SNP-CCs. First, an evolutionary bottleneck could have reduced the EHEC O26 population to four major genotypes. Because this would have had to occur in a highly specific manner favoring at least four genotypes during the emergence of EHEC O26, this scenario is unlikely. Another explanation for the emergence of different EHEC O26 SNP-CCs is the evolutionary concept of an "epidemic" population structure (Smith et al. 2000). In this model, highly adaptive and frequently pathogenic clones arise from a recombining background population and persist for a certain period before disappearing into the background population again through diversification, driven predominantly by recombination and secondarily by point mutation (Smith et al. 2000). Although E. coli in general exhibits frequent recombination (Wirth et al. 2006), especially between closely related members of the species (Leopold et al. 2011), diversity and recombination
[Figure 4: evolutionary model connecting the sequenced O26 isolates (e.g., HUSEC014 and HUSEC017) through the hypothetical ancestors A01 to A10 across SNP-CC1 to SNP-CC4, annotated with ST and stx genotypes (e.g., ST21, stx1a+2a) and with synonymous/nonsynonymous SNP counts; the marked time intervals are 19,700, 800, 300, and 550 years.]

Fig. 4. Evolutionary model and calculated age distances for EHEC O26, based on the neighbor-joining tree (fig. 3) and mapped onto the SNP clonal complexes (SNP-CC1 to SNP-CC4). Blue boxes are the EHEC O26 isolates with shotgun genome sequencing data. Hypothetical founders of O26 isolates (A01 to A10) are shown in gray. The ancestry is calculated based on the phylogeny displayed in figure 3. White boxes show the two fully sequenced reference strains EHEC O26:H11 and EHEC O111:H− (strains 11368 and 11128, respectively); EHEC O111:H− is assumed to be the closest relative of serogroup O26 (Whittam et al. 1988). Blue lines connect the isolates and the hypothetical ancestors, and red numbers show the synonymous/nonsynonymous SNPs between these genotypes. The gray line connects the O111:H− reference strain and the first O26 ancestor A10, as the common O26/O111 ancestor is not known. Distances are not drawn to scale.



Table 2

Distribution of Diseases over Four Different SNP-CCs of 120 EHEC O26 Strains Isolated from Patients and Associations of EHEC O26 SNP-CCs with HUS

SNP-CC (Total No. Isolates) | Disease (HUS/BD/D/A/U) | OR (95% CI) (HUS) | P Value
--- | --- | --- | ---
SNP-CC1 (8) | 5/0/2/0/1 | 0.96 (0.19-5.41) | 0.96
SNP-CC2 (60) | 47/0/11/1/1 | 3.86 (1.63-9.30) | <0.01
SNP-CC3 (39) | 22/3/11/3/0 | 0.65 (0.27-1.52) | 0.28
SNP-CC4 (13) | 2/5/6/0/0 | 0.08 (0.01-0.42) | <0.01

Note. BD, bloody diarrhea; D, diarrhea; A, asymptomatic; U, unknown; OR, odds ratio; CI, confidence interval.



within tightly constrained clones such as highly pathogenic EHEC are very limited (Noller et al. 2003; Wirth et al. 2006; Manning et al. 2008; Leopold et al. 2009), which argues against this model. The most plausible explanation may be the model of source-sink evolutionary dynamics, which was introduced for bacterial pathogens by Sokurenko et al. (2006) and has been described for EHEC O157 (Leopold et al. 2009) and uropathogenic E. coli (Chattopadhyay et al. 2007). This model postulates that a diverse population of EHEC O26 had already circulated over a long period of time in an evolutionarily stable niche (source) and that only a few strains were able to adapt during the transfer into a new niche (sink), under positive and purifying selection (Chattopadhyay et al. 2007). Calculation of the Ka/Ks value further corroborated this hypothesis, indicating significant purifying selection (Ka/Ks = 0.19) between the O26/O111 ancestor and the first strain (A10) of SNP-CC1 during the emergence of O26, that is, during the transfer into the sink.

This model immediately raises the question of the natural reservoir of EHEC O26. Although cattle are the known major reservoir of EHEC O26 (Blanco et al. 2004; Geue et al. 2009; Chase-Topping et al. 2012), highly pathogenic EHEC O26 of the new clone carrying solely stx2a (Bielaszewska et al. 2013) have only rarely been isolated from sources other than humans (Allerberger et al. 2003; Blanco et al. 2004; Chase-Topping et al. 2012). Future studies applying the described multiplex SNP assay to nonhuman samples are necessary to elucidate potential reservoirs and to further understand the evolutionary dynamics between source and sink.

By calculating the Ks value, we were also able to develop a timeline of EHEC O26 evolution (fig. 4). Using EHEC O111 as the closest relative of EHEC O26 (Whittam et al. 1988; Ogura et al. 2009), we dated the separation from a common ancestor to approximately 19,700 years ago. Assuming 200 generations/year, as done for EHEC O157:H7 (Leopold et al. 2009), EHEC O26 would have separated from the common EHEC O26/O111 ancestor 29,600 years ago. Interestingly, EHEC O26 resembles EHEC O157, in which two different groups emerged (sorbitol-fermenting O157:H− [subgroup B] and non-sorbitol-fermenting EHEC O157:H7 [subgroup C]) and the latter further differentiated over at least 2,500 years; EHEC O26 likewise diverged into the four extant SNP-CCs over the course of 2,400 years.
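Because the age interval $T = K_s/(\mu \cdot g)$ is inversely proportional to the number of generations per year $g$, changing $g$ from 300 to 200 rescales every interval by the same factor of 1.5. For the O26/O111 split this gives:

```latex
T_{200} = \frac{K_s}{\mu \cdot 200}
        = T_{300} \cdot \frac{300}{200}
        = 19{,}700 \times 1.5
        = 29{,}550 \approx 29{,}600 \ \text{years}
```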

We further calculated the association of EHEC disease entities (HUS vs. diarrhea without HUS and asymptomatic cases) with the SNP-CCs to determine whether the clustering into four SNP-CCs has a clinical impact (table 2). Indeed, only SNP-CC2 was significantly positively associated with what the treating physicians termed HUS, underlining the clinical importance of the EHEC O26 grouping. Whether an infected individual ultimately develops severe EHEC disease is, of course, also influenced by as yet unknown host factors. Interestingly, the existence of certain genotypes with increased virulence again parallels EHEC O157 (Manning et al. 2008; Jenke et al. 2010, 2012). Altogether, these observations suggest a common theme in EHEC evolution that is driven by the transfer into new hosts, that is, humans, and by rapid selection processes.

One limitation of our study might be a potential sampling bias toward isolates from severely ill patients. However, the SNP data of the diverse collection of strains, spanning several decades and different countries, still reflected this separation into four SNP-CCs. Moreover, each of the four isolates from asymptomatic carriers shared its SNP genotype with isolates from severely ill patients, further corroborating the separation into the four SNP-CCs. A second limitation might be the limited geographical distribution of the strains used to determine the evolutionary model, as, with the exception of one strain, all were isolated from patients in Germany. The inclusion of additional strains from different continents may provide additional information; however, the grouping of the SNP genotypes of the 110 isolates from seven European countries into the four SNP-CCs also supported our model, at least for Europe. Another limitation might be the inclusion of only 1,130 ORFs; we were, however, not aiming to separate closely related strains during outbreak investigations, but used these genes, which are conserved within E. coli, solely to generate a robust phylogenetic signal.

In summary, based on WGS and subsequent multiplexed SNP calling, we established an evolutionary model of the emergence and further diversification of EHEC O26 into four phylogenetically and clinically meaningful SNP-CCs. These data broaden our knowledge of the evolution of this important human pathogen and suggest a common theme in EHEC evolution. Moreover, information about the SNP-CC may help to implement more specific infection control measures and may enable a risk assessment for each detected EHEC O26 isolate. Future studies should also focus on EHEC O26 in nonhuman environments to understand its behavior and evolutionary processes in its likely reservoirs.

Supplementary Material 

Supplementary tables S1-S4 and figures S1 and S2 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).